Home > mailing lists

Re: [GENERAL] Determining if "in a text set" - Mailing list pgsql-sql

From	Herouth Maoz
Subject	Re: [GENERAL] Determining if "in a text set"
Date	March 28, 1999 12:23:17
Msg-id	l0311070cb3240c3166d7@[147.233.148.142] Whole thread
Responses	Re: [SQL] Re: [GENERAL] Determining if "in a text set" Re: [GENERAL] Determining if "in a text set"
List	pgsql-sql

Tree view

I am redirecting the discussion to the SQL list, as it is more appropriate
than the GENERAL list.

At 06:41 +0200 on 27/03/1999, Richi Plana wrote:

> I have a table/class with a text field which contains a single word
> (possibly trailed by whitespaces). Given a phrase (a text literal composed
> of words separated by one or more whitespaces), what's the best way to
> SELECT those tuples with fields containing instances that match exactly
> one of the words in the given string?
>
> Example:
>
> Table:
> Tuple#   ...   FieldN
> 1        ...   'MON'
> 2        ...   'TUE'
> 3        ...   'THURS'
> 4        ...   'THU'
> 5        ...   'FRI'
>
> Given phrase:
> 'MON TUE WED THURS'
> Tuple 1 will match
> Tuple 2 will match
> Tuple 3 will match
> Tuple 4 will NOT match
> Tuple 5 will NOT match
>
> Right now, I'm extracting (using strtok(), 8^P) each word from the given
> phrase and using "SELECT * FROM {class} where FieldN LIKE '%{word}%'", but
> it's terribly slow, relies on strtok() to parse words, and Tuple 4 above
> will match.

First, do without the strtok. You could achieve the same result (that also
matches tuple 4) with:
 SELECT * FROM {class} WHERE '{phrase}' LIKE '%' || ( rtrim( FieldN ) || '%' );

Hehe... the strange expression simply cuts off spaces from the right side
of the field, and concatenates % signs on its left and right. So you match
the phrase against the keyword, not the other way around.

It is more readable if you use regular expression rather than 'like':
 SELECT * FROM {class} WHERE '{phrase}' ~ rtrim( FieldN );

Now, turning attention to eliminating the fourth tuple from matching: I'm
not an expert on regular expressions. I think postgres does not support
perl's concept of word boundaries. But if the phrase is supposed to contain
just spaces (ascii 32, if you will) as word separators, I'd match against
the field with a space on the left and nothing on the right, a space on the
right and nothing on the left, spaces on both sides, or standing alone:
 SELECT * FROM {class} WHERE '{phrase}' ~ ' ' || ( rtrim( FieldN ) || '$' )    OR '{phrase}' ~ '^' || ( rtrim( FieldN )
||' ' )    OR '{phrase}' ~ ' ' || ( rtrim( FieldN ) || ' ' )    OR '{phrase}' = rtrim( FieldN );

The equivalent with 'like' is:
 SELECT * FROM {class} WHERE '{phrase}' LIKE '% ' || rtrim( FieldN )    OR '{phrase}' LIKE rtrim( FieldN ) || ' %'
OR'{phrase}' LIKE '% ' || ( rtrim( FieldN ) || ' %' )    OR '{phrase}' = rtrim( FieldN );

If the spaces between the words in your phrase are not just plain spaces
but can also be tabs etc, you will have to use the regexp version, and
write something like '[ \t\r\f\n]' - though I'm less than sure that
postgres's regular expressions support these. Can anyone tell us which
regexp definition postgres uses?

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma

pgsql-sql by date:

From: Herouth Maoz
Date: 28 March 1999, 10:46:50
Subject: Re: [SQL] Odd "problem", not sure if there is a solution ....

From: Herouth Maoz
Date: 28 March 1999, 12:43:23
Subject: Re: [SQL] Re: [GENERAL] Determining if "in a text set"

Re: [GENERAL] Determining if "in a text set" - Mailing list pgsql-sql

Previous

Next